Developing Indicators of Habitat and Ecosystem Change in the Gulf of Maine

Study area

The focus area for this project is coastal Maine, extending from the coast to the eastern boundary of the NOAA statistical areas 511, 512, 513.

usStates <- rnaturalearth::ne_states("united states of america", returnclass = "sf")
ne_us <- usStates %>% filter(name == "Maine")
statarea <- sf::st_read(paste0(gmRi::shared.path(group = "Res_Data", folder = "Shapefiles/Statistical_Areas"), "Statistical_Areas_2010_withnames.shp"), quiet = TRUE) %>% 
  filter(Id %in% c(511, 512, 513)) %>% 
  mutate(Id = as.factor(Id))
statarea_sf <- sf::st_simplify(statarea, dTolerance = .05)
mcc_turnoff_sf <- sf::st_read(here::here("Data/Shapefiles/MCC_turnoff/MCC_turnoff.shp"), quiet = TRUE)

ggplot() +  
  geom_sf(data = statarea_sf, aes(fill = Id)) + 
  geom_sf(data= ne_us, fill = "grey") +
  geom_sf(data = mcc_turnoff_sf, fill = "transparent", color = "black") +
  scale_fill_discrete(name = "Stat area") +
  theme(panel.background = element_blank(), 
        panel.grid = element_blank(), 
        axis.title = element_blank(),
        axis.ticks = element_blank())
Study region. NOAA statistical areas are indicated as colored polygons. Maine coastal current study region indicated by the black outlined open polygon

Study region. NOAA statistical areas are indicated as colored polygons. Maine coastal current study region indicated by the black outlined open polygon

Indicators

The following indicators are used in this report

  • Surface and bottom temperature anomalies (FVCOM, OISST)
  • Surface and bottom salinity anomalies (FVCOM)
  • Maine Coastal Current Index (FVCOM)
  • Species-based lobster predator index (NOAA Trawl Survey)
  • Size-based lobster predator index (NOAA Trawl Survey)
  • Continuous Plankton Recordings First Mode (small zooplankton)
  • Continuous Plankton Recordings Second Mode (Calanus)
# Indicators
allIndicators <- read_csv(here::here("Processed_Indicators/allIndicators.csv"))
allIndicators$Season <- factor(allIndicators$Season, levels= c("spring", "summer", "fall", "winter", "all"))
allIndicators$stat_area <- factor(allIndicators$stat_area, levels= c("511", "512", "513", "511-513"))
indicators <- unique(allIndicators$Indicator)

Indicators PCA

Previous analysis shows surface and bottom temperature and surface and bottom salinity load similarly in a PCA. To reduce variables only bottom temperature and bottom salinity are used in the following PCA.

  • Data: Bottom temp, bottom salinity, MCC, species based predator index, size based predator index, small zooplankton index, Calanus index
  • Method: Principal components analysis
  • Years: 1990-2016

PCA Summary

indicator_list <- c("fvcom_bt", "fvcom_bs","mcc", "nefsc_abundance", "nefsc_size_spectra_slope", "cpr_FirstMode", "cpr_SecondMode")

allIndex <- allIndicators %>% 
  filter(Season == "all", 
         stat_area == "511-513",
         Indicator %in% indicator_list) %>% 
  select(-Season, -stat_area) %>% 
  pivot_wider(names_from = Indicator, values_from = Value) %>% 
  na.omit()

### Composite PCA (combined stat areas)
indicator_pca <- allIndex %>% column_to_rownames(., var = "Year") %>% 
  prcomp(scale. = TRUE, center = TRUE)

index_pca <- tibble("PC1" = indicator_pca$x[,1],
                    "PC2" = indicator_pca$x[,2],
                    "PC3" = indicator_pca$x[,3])
index_pca["Year"] <- allIndex$Year

pc_importance <- summary(indicator_pca)$importance[, 1:5]

knitr::kable(pc_importance)
PC1 PC2 PC3 PC4 PC5
Standard deviation 1.553006 1.317107 1.005198 0.8982436 0.7340564
Proportion of Variance 0.344550 0.247820 0.144350 0.1152600 0.0769800
Cumulative Proportion 0.344550 0.592370 0.736720 0.8519800 0.9289600
scree1 <- fviz_eig(indicator_pca, ylab = "% variance explained") + scale_y_continuous(limits = c(0,60))

#scree1 / scree2 / scree3

PC time series

Below are the time series for PC1, PC2, and PC3.

PC1plot <- index_pca %>% 
  ggplot() + 
  geom_line(aes(Year, PC1))

PC2plot <- index_pca %>% 
  ggplot() + 
  geom_line(aes(Year, PC2))

PC3plot <- index_pca %>% 
  ggplot() + 
  geom_line(aes(Year, PC3))


PC1plot / PC2plot / PC3plot

### Maine composite

PC1plot <- index_pca %>% 
  ggplot() + 
  geom_line(aes(Year, PC1))

PC2plot <- index_pca %>% 
  ggplot() + 
  geom_line(aes(Year, PC2))

PC3plot <- index_pca %>% 
  ggplot() + 
  geom_line(aes(Year, PC3))

PC1plot / PC2plot / PC3plot

Loadings plot

The loadings plots show how each indicator is related to the resulting principal components.

Figure 4. Loadings of the variables

Figure 4. Loadings of the variables

Indicator PC1 PC2 PC3
fvcom_bt 0.447 -0.075 0.543
fvcom_bs 0.449 -0.436 0.043
mcc -0.347 0.476 0.266
nefsc_abundance 0.489 0.325 0.300
nefsc_size_spectra_slope -0.338 -0.107 0.686
cpr_FirstMode -0.193 -0.624 0.031
cpr_SecondMode -0.296 -0.268 0.266
Figure 4. Loadings of the variables

Figure 4. Loadings of the variables

Chronolgical cluster

The cluster plot groups years that are most similar based on PC1 and PC2. For this analysis stat areas were grouped together.

clusfun <- function(x){
  wss <- (nrow(x)-1)*sum(apply(x,2,var)) # get sum of squares

  for (i in 2:12) wss[i] <- sum(kmeans(x,
     centers=i)$withinss)
  
  return(wss)
}

allIndex_ca <- allIndex %>% 
  column_to_rownames(., var = "Year") %>% scale() %>% clusfun()

# look for break in plot like a scree plot

kmeans_scree <- ggplot() + 
  geom_line(aes(x = c(1:12), y = allIndex_composite)) +
  labs(x = "Number of Clusters",
  y ="Within groups sum of squares")

# K-Means Cluster Analysis
fit <- allIndex %>% 
  column_to_rownames(., var = "Year") %>% 
  scale() %>% 
  kmeans(3)

fit_df <- allIndex %>% 
  column_to_rownames(., var = "Year") 

fviz_cluster(object = fit, data = fit_df) + 
  theme_bw() 

Breakpoint Analysis

The breakpoint analysis estimates a change in the linear relationship in the data. The location of the break indicates there may be a difference in the relationship of the variable before and after that point. We find that breakpoints change depending on the variable.

Breakpoint of PC1 and PC2
# breapoint function
bp_analysis <- function(x, Npsi){
  mod <-  lm(values ~ Year, data = x)
  o <- tryCatch(segmented::segmented(mod, seg.Z = ~Year, npsi = Npsi),  # need to estimate bp
                error = function(cond){cond})
}

pscore_fun <- function(x, nbreak){
  lm1 <- lm(values ~ Year, data = x)
  pscore <- segmented::pscore.test(lm1, n.break = nbreak)
  return(pscore)
}

davies_fun <- function(x){
  lm1 <- lm(values ~ Year, data = x)
  davies <- segmented::davies.test(lm1)
  return(davies)
}

mean_bp_fun <- function(x, models){
  temp_mcp1 <- mcp::mcp(models, data = x, par_x = "Year")
  return(temp_mcp1)
}

# Breakpoint by stat area
df <- index_pca %>%
  select(Year, PC1, PC2, PC3)  %>%
  pivot_longer(cols = c(PC1, PC2, PC3), 
               names_to = "Indicator", values_to = "values") %>% 
  filter(Indicator == "PC1")

lm1 <- lm(values ~ Year, data = df)
summary(lm1)
## 
## Call:
## lm(formula = values ~ Year, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8924 -1.1454  0.1394  1.1722  3.0690 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -95.09246   49.36931  -1.926   0.0633 .
## Year          0.04756    0.02469   1.926   0.0633 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.491 on 31 degrees of freedom
## Multiple R-squared:  0.1069, Adjusted R-squared:  0.07808 
## F-statistic:  3.71 on 1 and 31 DF,  p-value: 0.0633
pscore <- pscore_fun(df,2)
pscore
## 
##  Score test for one/two changes in the slope
## 
## data:  formula = values ~ Year 
## breakpoint for variable = Year 
## model = gaussian , link = identity , method = lm
## observed value = 12.599, n.points = 10, p-value = 0.001837
## alternative hypothesis: two.sided   (2 breakpoints)
bp <- bp_analysis(df, 2)
summary(bp)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = mod, seg.Z = ~Year, npsi = Npsi)
## 
## Estimated Break-Point(s):
##                Est. St.Err
## psi1.Year 2000.000   2.48
## psi2.Year 2004.546   1.20
## 
## Meaningful coefficients of the linear terms:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -38.33431   79.29214  -0.483    0.633
## Year          0.01918    0.03983   0.482    0.634
## U1.Year      -0.53777    0.46011  -1.169       NA
## U2.Year       1.01691    0.46633   2.181       NA
## 
## Residual standard error: 1.025 on 27 degrees of freedom
## Multiple R-Squared: 0.6325,  Adjusted R-squared: 0.5644 
## 
## Convergence attained in 5 iter. (rel. change 3.4717e-06)
plot(bp)
points(x = df$Year, y = df$values)

mean_bp <- mean_bp_fun(df, list(values~1, 1~1, 1~1))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 33
##    Unobserved stochastic nodes: 6
##    Total graph size: 589
## 
## Initializing model
mean_bp
## Family: gaussian(link = 'identity')
## Iterations: 9000 from 3 chains.
## Segments:
##   1: values ~ 1
##   2: values ~ 1 ~ 1
##   3: values ~ 1 ~ 1
## 
## Population-level parameters:
##     name    mean   lower   upper Rhat n.eff
##     cp_1 2002.47 1993.21 2011.98    1   259
##     cp_2 2010.57 2009.00 2013.23    1   324
##    int_1   -0.27   -0.81    0.28    1   788
##    int_2   -0.79   -2.34    3.14    1   170
##    int_3    2.36    1.42    3.25    1  4019
##  sigma_1    1.00    0.73    1.29    1  1655
plot(mean_bp)

df <- index_pca %>%
  select(Year, PC1, PC2, PC3)  %>%
  pivot_longer(cols = c(PC1, PC2, PC3), 
               names_to = "Indicator", values_to = "values")  %>% 
  filter(Indicator == "PC2")

lm1 <- lm(values ~ Year, data = df)
summary(lm1)
## 
## Call:
## lm(formula = values ~ Year, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8033 -0.8688 -0.2395  0.6052  2.1867 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -104.70840   40.11542   -2.61   0.0138 *
## Year           0.05237    0.02006    2.61   0.0138 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.212 on 31 degrees of freedom
## Multiple R-squared:  0.1802, Adjusted R-squared:  0.1537 
## F-statistic: 6.813 on 1 and 31 DF,  p-value: 0.01381
pscore <- pscore_fun(df, 1)
pscore
## 
##  Score test for one/two changes in the slope
## 
## data:  formula = values ~ Year 
## breakpoint for variable = Year 
## model = gaussian , link = identity , method = lm
## observed value = -2.8566, n.points = 10, p-value = 0.007579
## alternative hypothesis: two.sided   (1 breakpoint)
bp <- bp_analysis(df, 1)
summary(bp)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = mod, seg.Z = ~Year, npsi = Npsi)
## 
## Estimated Break-Point(s):
##                Est. St.Err
## psi1.Year 1996.884  2.372
## 
## Meaningful coefficients of the linear terms:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -423.14013  104.67352  -4.042 0.000356 ***
## Year           0.21248    0.05265   4.036 0.000363 ***
## U1.Year       -0.28410    0.06439  -4.412       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.956 on 29 degrees of freedom
## Multiple R-Squared: 0.5226,  Adjusted R-squared: 0.4732 
## 
## Convergence attained in 3 iter. (rel. change 0)
plot(bp)
points(x = df$Year, y = df$values)

mean_bp <- mean_bp_fun(df, list(values~1, 1~1, 1~1))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 33
##    Unobserved stochastic nodes: 6
##    Total graph size: 589
## 
## Initializing model
mean_bp
## Family: gaussian(link = 'identity')
## Iterations: 9000 from 3 chains.
## Segments:
##   1: values ~ 1
##   2: values ~ 1 ~ 1
##   3: values ~ 1 ~ 1
## 
## Population-level parameters:
##     name     mean   lower   upper Rhat n.eff
##     cp_1 1990.153 1988.00 1992.10    1  5724
##     cp_2 2001.406 2000.00 2002.41    1  1150
##    int_1   -1.720   -2.18   -1.23    1  5498
##    int_2    1.330    0.85    1.79    1  4545
##    int_3    0.045   -0.32    0.41    1  3666
##  sigma_1    0.697    0.53    0.89    1  2787
plot(mean_bp)

df <- index_pca %>%
  select(Year, PC1, PC2, PC3)  %>%
  pivot_longer(cols = c(PC1, PC2, PC3), 
               names_to = "Indicator", values_to = "values")  %>% 
  filter(Indicator == "PC3")

lm1 <- lm(values ~ Year, data = df)
summary(lm1)
## 
## Call:
## lm(formula = values ~ Year, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.01310 -0.41492 -0.07562  0.43664  2.43678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -46.42975   32.76848  -1.417    0.166
## Year          0.02322    0.01639   1.417    0.166
## 
## Residual standard error: 0.9897 on 31 degrees of freedom
## Multiple R-squared:  0.06082,    Adjusted R-squared:  0.03053 
## F-statistic: 2.008 on 1 and 31 DF,  p-value: 0.1665
pscore <- pscore_fun(df, 1)
pscore
## 
##  Score test for one/two changes in the slope
## 
## data:  formula = values ~ Year 
## breakpoint for variable = Year 
## model = gaussian , link = identity , method = lm
## observed value = -1.1554, n.points = 10, p-value = 0.2568
## alternative hypothesis: two.sided   (1 breakpoint)
bp <- bp_analysis(df, 2)
summary(bp)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = mod, seg.Z = ~Year, npsi = Npsi)
## 
## Estimated Break-Point(s):
##                Est. St.Err
## psi1.Year 1992.999  3.865
## psi2.Year 2001.582  2.639
## 
## Meaningful coefficients of the linear terms:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  92.99652  194.28718   0.479    0.636
## Year         -0.04708    0.09786  -0.481    0.634
## U1.Year       0.25792    0.15576   1.656       NA
## U2.Year      -0.30268    0.13353  -2.267       NA
## 
## Residual standard error: 0.9387 on 27 degrees of freedom
## Multiple R-Squared: 0.2643,  Adjusted R-squared: 0.128 
## 
## Convergence attained in 6 iter. (rel. change 4.9673e-06)
plot(bp)
points(x = df$Year, y = df$values)

mean_bp <- mean_bp_fun(df, list(values~1, 1~1))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 33
##    Unobserved stochastic nodes: 4
##    Total graph size: 416
## 
## Initializing model
mean_bp
## Family: gaussian(link = 'identity')
## Iterations: 9000 from 3 chains.
## Segments:
##   1: values ~ 1
##   2: values ~ 1 ~ 1
## 
## Population-level parameters:
##     name    mean   lower   upper Rhat n.eff
##     cp_1 1997.56 1984.24 2015.99    1   891
##    int_1   -0.36   -1.08    0.32    1  1787
##    int_2    0.19   -0.72    0.87    1  1103
##  sigma_1    0.99    0.75    1.25    1  3245
plot(mean_bp)

Lobster Data

all_lob_data <- read_csv(here::here("Processed_Indicators/all_lob_data.csv")) 
lob_breakpoints <- read_csv(here::here("Processed_Indicators/lob_breakpoints.csv"))
unique(lob_breakpoints$name)
## [1] "nefsc_biomass"   "nefsc_abundance" "NEFSCindex"      "MEindex"        
## [5] "ME_landings"     "menh_abundance"  "menh_biomass"    "sublegal_cpue"  
## [9] "ALSI"

Biological data analysis

xyPlot <- function(lob_name){
  
  x <- all_lob_data %>% 
    filter(name == lob_name)
  
  x %>% 
    left_join(index_pca) %>% 
    ggplot() +
    geom_point(aes(PC1, lob_index, color = Year)) + 
    geom_smooth(aes(PC1, lob_index, color = Year), method = "gam", se =FALSE) +
    facet_wrap(~stat_area+Sex, scales = "free_y") +
    scale_color_viridis_c()
}


cor_fun <- function(lob_name){
  all_lob_data  %>% 
    filter(name %in% lob_name) %>% 
    left_join(index_pca, by = c("Year")) %>% 
    group_by(name, stat_area, Sex, Season) %>% 
    summarise(corPC1 = corrr::correlate(lob_index, PC1)[[2]],
              corPC2 = corrr::correlate(lob_index, PC2)[[2]],
              corPC3 = corrr::correlate(lob_index, PC3)[[2]])
}

stepwiseAIC <- function(lob_name){
  
  x <- all_lob_data %>% 
    filter(name == lob_name)
  
  x %>% 
    left_join(index_pca) %>% 
    select(Year, stat_area, lob_index, PC1, PC2, PC3, name, Sex, Season) %>% 
    group_by(stat_area, name, Sex, Season) %>% 
    nest() %>% 
    mutate(data = map(data, ~na.omit(.x)),
           aic = purrr::map(data, ~MASS::stepAIC(object = lm(lob_index ~ PC1 + PC2 + PC3 + Year, data = .x), direction = "both", trace = 0)),
           stp = purrr::map(aic, broom::glance),
           index = purrr::map(aic, ~as.character(.x$call$formula)),
           model = paste(index[[1]][[2]], index[[1]][[1]], index[[1]][[3]])) %>% 
    select(stat_area, stp, model, name) %>% 
    unnest(c(stp, model)) %>% 
    select(name, stat_area, Sex, Season, r.squared, p.value, model)
}


aicModelGAM <- function(lob_name, k){
  
  breaks <- lob_breakpoints %>% 
    filter(name == lob_name)
  
  df <- all_lob_data %>% 
    filter(name == lob_name)
  
  df <- left_join(breaks, df, by = c("stat_area", "Season", "name", "Sex"))
    
  df$breakpoint1[is.na(df$breakpoint1)] <- 3000
  
  df$breakpoint2[is.na(df$breakpoint2)] <- 3000
  
  df$breakpoint3[is.na(df$breakpoint3)] <- 3000
  
  df <- df %>% rowwise() %>% 
    mutate(Period1 = Year < breakpoint1,
           Period2 = Year %in% seq(from = round(breakpoint1), to = round(breakpoint2), 1),
           Period3 = Year %in% seq(from = round(breakpoint2), to = round(breakpoint3), 1),
           Period4 = Year > breakpoint3) %>% 
    mutate(Period = if_else(Period1, "one", 
                            if_else(Period2, "two", 
                                    if_else(Period3, "three", "four")))) %>% 
    select(-Period1, -Period2, -Period3, -Period4)
  
  df$Period <- factor(df$Period, levels = c("one", "two", "three", "four"))
  
  gams <- df %>% 
    left_join(index_pca) %>% 
    group_by(stat_area, name, Season, Sex, breakpoint1, breakpoint2, breakpoint3) %>% 
    nest() %>% 
    mutate(data = map(data, ~na.omit(.x)),
           gam1 = purrr::map(data, ~mgcv::gam(lob_index ~ s(Year, by = Period, k=4),
                                              data = .x, select = TRUE, method="REML")),
           gam2 = purrr::map(data, ~mgcv::gam(lob_index ~ PC1 + s(Year, by = Period, k=4),
                                              data = .x, select = TRUE, method="REML")),
           gam3 = purrr::map(data, ~mgcv::gam(lob_index ~ PC1 + PC2 + s(Year, by = Period, k=4),
                                              data = .x, select = TRUE, method="REML")),
           gam4 = purrr::map(data, ~mgcv::gam(lob_index ~ PC1 + PC2 + PC3 + s(Year, by = Period, k=4),
                                              data = .x, select = TRUE, method="REML")))
  return(gams)
}



aicModelGAM2 <- function(lob_name, k){
  
  breaks <- lob_breakpoints %>% 
    filter(name == lob_name)
  
  df <- all_lob_data %>% 
    filter(name == lob_name)
  
  df <- left_join(breaks, df, by = c("stat_area", "Season", "name", "Sex"))
    
  df$breakpoint1[is.na(df$breakpoint1)] <- 3000
  
  df$breakpoint2[is.na(df$breakpoint2)] <- 3000
  
  df$breakpoint3[is.na(df$breakpoint3)] <- 3000
  
  df <- df %>% rowwise() %>% 
    mutate(Period1 = Year < breakpoint1,
           Period2 = Year %in% seq(from = round(breakpoint1), to = round(breakpoint2), 1),
           Period3 = Year %in% seq(from = round(breakpoint2), to = round(breakpoint3), 1),
           Period4 = Year > breakpoint3) %>% 
    mutate(Period = if_else(Period1, "one", 
                            if_else(Period2, "two", 
                                    if_else(Period3, "three", "four")))) %>% 
    select(-Period1, -Period2, -Period3, -Period4)
  
  df$Period <- factor(df$Period, levels = c("one", "two", "three", "four"))
  
  gams <- df %>% 
    left_join(index_pca) %>% 
    group_by(stat_area, name, Season, Sex, breakpoint1, breakpoint2, breakpoint3) %>% 
    nest() %>% 
    mutate(data = map(data, ~na.omit(.x)),
           gam1 = purrr::map(data, ~mgcv::gam(lob_index ~ s(Year, bs="re", k=4),
                                              data = .x, method="REML")),
           gam2 = purrr::map(data, ~mgcv::gam(lob_index ~ s(PC1, by = Period, k=4) + 
                                                s(Year, bs="re", k=4),
                                              data = .x, method="REML")),
           gam3 = purrr::map(data, ~tryCatch(mgcv::gam(lob_index ~ s(PC1, by = Period, k=4) + 
                                                s(PC2, by = Period, k=4) + 
                                                s(Year, bs="re", k=4),
                                              data = .x, method="REML"), error = function(cond){cond})))
  return(gams)
}


gamInfo <- function(df, stat_area = "511-513", season = "all", sex = "M+F"){

  summary(df1$gams[[1]])
}

aicModel_indicators <- function(x){
  
      if(is.character(x$stat_area)){
      x <- select(x, -stat_area)
      }
  
  x %>% 
    left_join(index_pca) %>% 
    na.omit() %>% 
    select(stat_area, lob_index, bot_temp, bot_sal, MCC, predators, FirstMode, SecondMode, name) %>% 
    group_by(stat_area, name) %>% 
    nest() %>% 
    mutate(aic = purrr::map(data, ~MASS::stepAIC(object = lm(lob_index ~ bot_temp + bot_sal + MCC + predators + FirstMode + SecondMode, data = .x), direction = "both", trace = 0)),
           stp = purrr::map(aic, broom::glance),
           index = purrr::map(aic, ~as.character(.x$call$formula)),
           model = paste(index[[1]][[2]], index[[1]][[1]], index[[1]][[3]])) %>% 
    select(stat_area, stp, model, name) %>% 
    unnest(c(stp, model))
}

ALSI index

Correlation table

cor_fun("ALSI") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
ALSI 511 M+F all -0.4826965 0.0056847 0.0319171
ALSI 511-513 M+F all -0.6888378 -0.2054076 0.0341850
ALSI 512 M+F all -0.3929401 -0.2609212 -0.1615003
ALSI 513 M+F all -0.7817008 -0.2702854 0.1527823

Scatter plot

xyPlot("ALSI")

AIC lm model selection table

stepwiseAIC("ALSI")
## # A tibble: 4 x 7
## # Groups:   stat_area, name, Sex, Season [4]
##   name  stat_area Sex   Season r.squared  p.value model                
##   <chr> <chr>     <chr> <chr>      <dbl>    <dbl> <chr>                
## 1 ALSI  511       M+F   all        0.233 0.0684   lob_index ~ PC1      
## 2 ALSI  512       M+F   all        0.154 0.119    lob_index ~ PC1      
## 3 ALSI  513       M+F   all        0.702 0.000209 lob_index ~ PC1 + PC2
## 4 ALSI  511-513   M+F   all        0.474 0.00223  lob_index ~ PC1
aicSelGAM <- aicModelGAM("ALSI", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 4.712916 7.303213
## aicSelGAM$gam2[[1]] 4.353718 7.562130
## aicSelGAM$gam3[[1]] 4.646920 7.723401
## aicSelGAM$gam4[[1]] 5.626361 9.628833
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.7704     0.0969   7.951 1.65e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf Ref.df     F p-value   
## s(Year):Periodone 1.3198      3 1.313 0.08285 . 
## s(Year):Periodtwo 0.9266      3 4.205 0.00196 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =    0.5   Deviance explained =   57%
## -REML = 4.7053  Scale est. = 0.063875  n = 17
aicSelGAM <- aicModelGAM2("ALSI", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                           df       AIC
## aicSelGAM$gam1[[1]] 2.851683 15.636166
## aicSelGAM$gam2[[1]] 4.000026  9.001970
## aicSelGAM$gam3[[1]] 6.797533  5.488278
summary(aicSelGAM$gam3[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(PC2, by = Period, 
##     k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -28.35      26.14  -1.085      0.3
## 
## Approximate significance of smooth terms:
##                   edf Ref.df      F p-value   
## s(PC1):Periodone 1.00      1  8.509 0.01317 * 
## s(PC1):Periodtwo 1.00      1 12.293 0.00438 **
## s(PC2):Periodone 1.00      1  0.439 0.52067   
## s(PC2):Periodtwo 1.00      1  7.249 0.01996 * 
## s(Year)          0.55      1  1.222 0.16279   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.577   Deviance explained = 69.7%
## -REML = 5.2725  Scale est. = 0.053959  n = 17

Subleagal index

Correlation table

cor_fun("sublegal_cpue") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
sublegal_cpue 511 M+F all 0.6298765 0.0853711 0.1072546
sublegal_cpue 511-513 M+F all 0.7159586 0.1290269 -0.0519357
sublegal_cpue 512 M+F all 0.7218666 0.1436987 -0.1101526
sublegal_cpue 513 M+F all 0.7145212 0.1199746 -0.0420219

Scatter plot

xyPlot("sublegal_cpue")

AIC lm model selection table

stepwiseAIC("sublegal_cpue")
## # A tibble: 4 x 7
## # Groups:   stat_area, name, Sex, Season [4]
##   name          stat_area Sex   Season r.squared  p.value model           
##   <chr>         <chr>     <chr> <chr>      <dbl>    <dbl> <chr>           
## 1 sublegal_cpue 511       M+F   all        0.582 0.00631  lob_index ~ Year
## 2 sublegal_cpue 512       M+F   all        0.822 0.000119 lob_index ~ Year
## 3 sublegal_cpue 513       M+F   all        0.717 0.00102  lob_index ~ Year
## 4 sublegal_cpue 511-513   M+F   all        0.776 0.000339 lob_index ~ Year
aicSelGAM <- aicModelGAM("sublegal_cpue", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 4.238170 188.0194
## aicSelGAM$gam2[[1]] 5.992841 170.4609
## aicSelGAM$gam3[[1]] 6.998561 162.6453
## aicSelGAM$gam4[[1]] 7.999257 156.8306
summary(aicSelGAM$gam4[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ PC1 + PC2 + PC3 + s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2113.93     143.51  14.731 0.000118 ***
## PC1           728.34     113.63   6.409 0.002968 ** 
## PC2          -412.75     196.46  -2.101 0.103030    
## PC3           161.89      77.49   2.089 0.104408    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                    edf Ref.df     F  p-value    
## s(Year):Periodone 2.97      3 110.8 2.68e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.989   Deviance explained = 99.5%
## -REML =  61.82  Scale est. = 58060     n = 11
aicSelGAM <- aicModelGAM2("sublegal_cpue", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 2.998975 189.8256
## aicSelGAM$gam2[[1]] 3.419227 198.5814
## aicSelGAM$gam3[[1]] 4.149846 197.8310
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1175914     214337  -5.486 0.000382 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value    
## s(Year) 0.968      1 30.24 0.000203 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.751   Deviance explained = 77.6%
## -REML = 87.463  Scale est. = 1.2909e+06  n = 11

ME yearly landings

Correlation table

cor_fun("ME_landings") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
ME_landings 511-513 M+F all 0.5496218 0.2654963 0.2174014

Scatter plot

xyPlot("ME_landings")

AIC lm model selection table

stepwiseAIC("ME_landings")
## # A tibble: 1 x 7
## # Groups:   stat_area, name, Sex, Season [1]
##   name        stat_area Sex   Season r.squared  p.value model                   
##   <chr>       <chr>     <chr> <chr>      <dbl>    <dbl> <chr>                   
## 1 ME_landings 511-513   M+F   all        0.950 6.03e-19 lob_index ~ PC1 + PC2 +…
aicSelGAM <- aicModelGAM("ME_landings", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                            df      AIC
## aicSelGAM$gam1[[1]]  7.979102 1113.894
## aicSelGAM$gam2[[1]]  8.955785 1114.121
## aicSelGAM$gam3[[1]]  9.939809 1114.808
## aicSelGAM$gam4[[1]] 10.951614 1113.487
summary(aicSelGAM$gam4[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ PC1 + PC2 + PC3 + s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 51730418    1324855  39.046   <2e-16 ***
## PC1          1351734     934577   1.446    0.161    
## PC2          -575474     863557  -0.666    0.512    
## PC3          1389735     872052   1.594    0.124    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df      F  p-value    
## s(Year):Periodone   1.8727      3  38.75 1.93e-14 ***
## s(Year):Periodtwo   0.9950      3  64.77  < 2e-16 ***
## s(Year):Periodthree 1.8078      3  80.71  < 2e-16 ***
## s(Year):Periodfour  0.9976      3 128.59  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.986   Deviance explained =   99%
## -REML = 505.49  Scale est. = 1.9227e+13  n = 33
aicSelGAM <- aicModelGAM2("ME_landings", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                            df      AIC
## aicSelGAM$gam1[[1]]  2.999978 1180.428
## aicSelGAM$gam2[[1]] 11.240635 1130.078
## aicSelGAM$gam3[[1]] 14.268073 1119.021
summary(aicSelGAM$gam2[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.608e+09  4.207e+08  -13.33 1.53e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                       edf Ref.df       F  p-value    
## s(PC1):Periodone   2.3410  2.696   3.283   0.0382 *  
## s(PC1):Periodtwo   1.6029  1.872   1.595   0.2905    
## s(PC1):Periodthree 1.4298  1.706  34.112 4.21e-07 ***
## s(PC1):Periodfour  1.7970  1.968  17.236 1.29e-05 ***
## s(Year)            0.9945  1.000 180.981  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.977   Deviance explained = 98.3%
## -REML = 485.54  Scale est. = 3.0569e+13  n = 33

ME Trawl biomass and abundance

Correlation table

cor_fun("menh_biomass") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
menh_biomass 511 M+F all 0.7092925 -0.0825885 -0.3862958
menh_biomass 511 M+F fall 0.6626761 -0.0295895 -0.4122273
menh_biomass 511 M+F spring 0.7898051 -0.0019561 -0.2528352
menh_biomass 511-513 M+F all 0.7425684 0.0147722 -0.3517762
menh_biomass 511-513 M+F fall 0.6574322 0.0636393 -0.5070726
menh_biomass 511-513 M+F spring 0.7500846 -0.0199554 -0.1108579
menh_biomass 512 M+F all 0.6649572 0.0782806 -0.3365120
menh_biomass 512 M+F fall 0.5729022 0.0865392 -0.5317981
menh_biomass 512 M+F spring 0.5775804 0.0474127 -0.0950820
menh_biomass 513 M+F all 0.7678340 0.0554422 -0.1563695
menh_biomass 513 M+F fall 0.5259902 0.1994919 -0.4151779
menh_biomass 513 M+F spring 0.7936910 -0.3332932 0.2212199
cor_fun("menh_abundance") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
menh_abundance 511 M+F all 0.6984764 -0.0608745 -0.3484765
menh_abundance 511 M+F fall 0.6332350 0.0010156 -0.3893151
menh_abundance 511 M+F spring 0.7780385 -0.0096281 -0.1912699
menh_abundance 511-513 M+F all 0.7397975 -0.0135055 -0.3449486
menh_abundance 511-513 M+F fall 0.6432570 0.0563076 -0.5129313
menh_abundance 511-513 M+F spring 0.7694314 -0.0300206 -0.0727537
menh_abundance 512 M+F all 0.6775895 0.0103388 -0.3516256
menh_abundance 512 M+F fall 0.5924629 0.0413271 -0.5411021
menh_abundance 512 M+F spring 0.6192603 0.0415271 -0.0806528
menh_abundance 513 M+F all 0.7754725 0.0122268 -0.1952224
menh_abundance 513 M+F fall 0.5151820 0.1781658 -0.4580989
menh_abundance 513 M+F spring 0.8323088 -0.3315108 0.2141857

Scatter plot

xyPlot("menh_biomass")

xyPlot("menh_abundance")

AIC lm model selection table

stepwiseAIC("menh_biomass")
## # A tibble: 12 x 7
## # Groups:   stat_area, name, Sex, Season [12]
##    name       stat_area Sex   Season r.squared   p.value model                  
##    <chr>      <chr>     <chr> <chr>      <dbl>     <dbl> <chr>                  
##  1 menh_biom… 511       M+F   fall       0.776   3.01e-6 lob_index ~ Year       
##  2 menh_biom… 511       M+F   spring     0.816   1.66e-6 lob_index ~ Year       
##  3 menh_biom… 512       M+F   fall       0.664   4.87e-4 lob_index ~ Year + PC2 
##  4 menh_biom… 512       M+F   spring     0.530   1.39e-3 lob_index ~ Year       
##  5 menh_biom… 513       M+F   fall       0.544   4.11e-3 lob_index ~ PC2 + Year 
##  6 menh_biom… 513       M+F   spring     0.782   1.21e-3 lob_index ~ PC1 + PC2 …
##  7 menh_biom… 511-513   M+F   all        0.887   2.07e-6 lob_index ~ PC1 + PC2 …
##  8 menh_biom… 511-513   M+F   fall       0.816   7.03e-6 lob_index ~ PC2 + Year 
##  9 menh_biom… 511-513   M+F   spring     0.807   2.24e-5 lob_index ~ PC3 + Year 
## 10 menh_biom… 511       M+F   all        0.889   2.08e-7 lob_index ~ PC1 + Year 
## 11 menh_biom… 512       M+F   all        0.734   9.47e-5 lob_index ~ PC2 + Year 
## 12 menh_biom… 513       M+F   all        0.668   4.47e-4 lob_index ~ PC1 + Year
stepwiseAIC("menh_abundance")
## # A tibble: 12 x 7
## # Groups:   stat_area, name, Sex, Season [12]
##    name         stat_area Sex   Season r.squared    p.value model               
##    <chr>        <chr>     <chr> <chr>      <dbl>      <dbl> <chr>               
##  1 menh_abunda… 511       M+F   fall       0.687    3.92e-5 lob_index ~ Year    
##  2 menh_abunda… 511       M+F   spring     0.783    4.93e-5 lob_index ~ PC3 + Y…
##  3 menh_abunda… 512       M+F   fall       0.683    4.32e-5 lob_index ~ Year    
##  4 menh_abunda… 512       M+F   spring     0.649    1.11e-3 lob_index ~ PC3 + Y…
##  5 menh_abunda… 513       M+F   fall       0.607    1.44e-3 lob_index ~ PC2 + Y…
##  6 menh_abunda… 513       M+F   spring     0.770    7.18e-5 lob_index ~ PC1 + P…
##  7 menh_abunda… 511-513   M+F   all        0.901    8.73e-7 lob_index ~ PC1 + P…
##  8 menh_abunda… 511-513   M+F   fall       0.807    9.93e-6 lob_index ~ PC2 + Y…
##  9 menh_abunda… 511-513   M+F   spring     0.830    9.79e-6 lob_index ~ PC3 + Y…
## 10 menh_abunda… 511       M+F   all        0.824    5.14e-6 lob_index ~ PC1 + Y…
## 11 menh_abunda… 512       M+F   all        0.797    1.44e-6 lob_index ~ Year    
## 12 menh_abunda… 513       M+F   all        0.721    1.31e-4 lob_index ~ PC1 + Y…
aicSelGAM <- aicModelGAM("menh_biomass", 5)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 4.796816 109.1906
## aicSelGAM$gam2[[1]] 5.727764 111.0453
## aicSelGAM$gam3[[1]] 5.966718 112.3889
## aicSelGAM$gam4[[1]] 6.951813 114.3731
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    26.64       2.27   11.74 1.68e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F  p-value    
## s(Year):Periodone 1.400      3  1.893   0.0415 *  
## s(Year):Periodtwo 0.989      3 29.978 2.86e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.915   Deviance explained = 92.8%
## -REML = 53.764  Scale est. = 25.613    n = 17
aicSelGAM <- aicModelGAM2("menh_biomass", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 2.999805 120.4227
## aicSelGAM$gam2[[1]] 6.854958 105.3202
## aicSelGAM$gam3[[1]] 8.810623 108.0836
summary(aicSelGAM$gam2[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -4277.6      608.6  -7.028 1.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F  p-value    
## s(PC1):Periodone 1.0003  1.001  9.671   0.0092 ** 
## s(PC1):Periodtwo 2.5817  2.855  7.053   0.0039 ** 
## s(Year)          0.9805  1.000 50.159 2.67e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.937   Deviance explained = 95.5%
## -REML = 47.948  Scale est. = 19.054    n = 17
aicSelGAM <- aicModelGAM("menh_abundance", 5)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 4.815287 160.3127
## aicSelGAM$gam2[[1]] 5.821130 161.9939
## aicSelGAM$gam3[[1]] 6.702292 163.7505
## aicSelGAM$gam4[[1]] 6.978343 165.2635
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   110.12      10.41   10.58 6.24e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F  p-value    
## s(Year):Periodone 1.447      3  2.815    0.015 *  
## s(Year):Periodtwo 0.990      3 32.402 1.01e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.926   Deviance explained = 93.7%
## -REML = 78.049  Scale est. = 518.87    n = 17
aicSelGAM <- aicModelGAM2("menh_abundance", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 2.999869 171.0800
## aicSelGAM$gam2[[1]] 6.891129 153.4881
## aicSelGAM$gam3[[1]] 8.837939 156.6948
summary(aicSelGAM$gam2[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -21692       2522    -8.6 2.59e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F  p-value    
## s(PC1):Periodone 1.0001  1.000  9.083  0.01107 *  
## s(PC1):Periodtwo 2.6398  2.891  9.442  0.00115 ** 
## s(Year)          0.9868  1.000 74.879 6.07e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.954   Deviance explained = 96.7%
## -REML = 68.168  Scale est. = 324.43    n = 17

NEFSC trawl biomass and abundance

Correlation table

cor_fun("nefsc_biomass") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
nefsc_biomass 511 M+F all 0.5605885 0.2029895 -0.0445153
nefsc_biomass 511 M+F fall 0.4379324 0.2859737 -0.0130997
nefsc_biomass 511 M+F spring 0.6049944 -0.0202197 -0.1788490
nefsc_biomass 511-513 M+F all 0.5998806 0.1991287 -0.0205183
nefsc_biomass 511-513 M+F fall 0.5520372 0.2659903 -0.0991014
nefsc_biomass 511-513 M+F spring 0.6244460 0.0687025 0.0138216
nefsc_biomass 512 M+F all 0.5560883 0.1882706 -0.0982838
nefsc_biomass 512 M+F fall 0.4681876 0.2385343 -0.2033314
nefsc_biomass 512 M+F spring 0.5937683 0.0725603 -0.0167400
nefsc_biomass 513 M+F all 0.5953150 0.1726058 0.2866820
nefsc_biomass 513 M+F fall 0.5115360 0.1705985 0.2477102
nefsc_biomass 513 M+F spring 0.5863728 -0.0021220 0.2589758
cor_fun("nefsc_abundance") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
nefsc_abundance 511 M+F all 0.6155917 0.1308479 -0.0986373
nefsc_abundance 511 M+F fall 0.5497522 0.2146793 -0.0443182
nefsc_abundance 511 M+F spring 0.6237567 -0.0345446 -0.2117479
nefsc_abundance 511-513 M+F all 0.6209599 0.1595062 -0.1009855
nefsc_abundance 511-513 M+F fall 0.5708431 0.2282160 -0.1729612
nefsc_abundance 511-513 M+F spring 0.6426839 0.0312209 -0.0688691
nefsc_abundance 512 M+F all 0.5779740 0.1480157 -0.1753663
nefsc_abundance 512 M+F fall 0.5109276 0.2094406 -0.2608596
nefsc_abundance 512 M+F spring 0.6048870 0.0323702 -0.1213157
nefsc_abundance 513 M+F all 0.5907205 0.1640788 0.3535189
nefsc_abundance 513 M+F fall 0.5392136 0.1884773 0.2554866
nefsc_abundance 513 M+F spring 0.5101000 -0.0365485 0.4001965

Scatter plot

xyPlot("nefsc_biomass")

xyPlot("nefsc_abundance")

AIC lm model selection table

stepwiseAIC("nefsc_biomass")
## # A tibble: 12 x 7
## # Groups:   stat_area, name, Sex, Season [12]
##    name       stat_area Sex   Season r.squared     p.value model                
##    <chr>      <chr>     <chr> <chr>      <dbl>       <dbl> <chr>                
##  1 nefsc_bio… 511       M+F   all        0.714     8.98e-8 lob_index ~ PC1 + PC…
##  2 nefsc_bio… 512       M+F   all        0.724     3.02e-8 lob_index ~ PC1 + PC…
##  3 nefsc_bio… 513       M+F   all        0.722     3.26e-8 lob_index ~ PC1 + PC…
##  4 nefsc_bio… 511       M+F   spring     0.812     1.01e-7 lob_index ~ PC1 + PC…
##  5 nefsc_bio… 512       M+F   spring     0.696     2.13e-6 lob_index ~ PC1 + PC…
##  6 nefsc_bio… 513       M+F   spring     0.798     3.74e-8 lob_index ~ PC1 + PC…
##  7 nefsc_bio… 512       M+F   fall       0.660     9.98e-7 lob_index ~ PC1 + PC…
##  8 nefsc_bio… 513       M+F   fall       0.457     1.93e-4 lob_index ~ PC1 + Ye…
##  9 nefsc_bio… 511       M+F   fall       0.430     6.74e-4 lob_index ~ PC1 + Ye…
## 10 nefsc_bio… 511-513   M+F   all        0.801     1.80e-9 lob_index ~ PC1 + PC…
## 11 nefsc_bio… 511-513   M+F   spring     0.803     8.10e-8 lob_index ~ PC1 + PC…
## 12 nefsc_bio… 511-513   M+F   fall       0.753     1.19e-8 lob_index ~ PC1 + PC…
stepwiseAIC("nefsc_abundance")
## # A tibble: 12 x 7
## # Groups:   stat_area, name, Sex, Season [12]
##    name        stat_area Sex   Season r.squared    p.value model                
##    <chr>       <chr>     <chr> <chr>      <dbl>      <dbl> <chr>                
##  1 nefsc_abun… 511       M+F   all        0.650    1.47e-6 lob_index ~ PC1 + PC…
##  2 nefsc_abun… 512       M+F   all        0.714    2.66e-7 lob_index ~ PC1 + PC…
##  3 nefsc_abun… 513       M+F   all        0.705    7.73e-8 lob_index ~ PC1 + PC…
##  4 nefsc_abun… 511       M+F   spring     0.667    1.04e-5 lob_index ~ PC1 + PC…
##  5 nefsc_abun… 512       M+F   spring     0.717    4.65e-6 lob_index ~ PC1 + PC…
##  6 nefsc_abun… 513       M+F   spring     0.637    2.77e-5 lob_index ~ PC1 + PC…
##  7 nefsc_abun… 512       M+F   fall       0.655    1.22e-6 lob_index ~ PC1 + PC…
##  8 nefsc_abun… 513       M+F   fall       0.479    1.08e-4 lob_index ~ PC1 + Ye…
##  9 nefsc_abun… 511       M+F   fall       0.480    2.05e-4 lob_index ~ PC1 + Ye…
## 10 nefsc_abun… 511-513   M+F   all        0.765    1.83e-8 lob_index ~ PC1 + PC…
## 11 nefsc_abun… 511-513   M+F   spring     0.792    1.48e-7 lob_index ~ PC1 + PC…
## 12 nefsc_abun… 511-513   M+F   fall       0.698    1.89e-7 lob_index ~ PC1 + PC…
aicSelGAM <- aicModelGAM("nefsc_biomass", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 6.962757 555.1162
## aicSelGAM$gam2[[1]] 7.944333 556.4525
## aicSelGAM$gam3[[1]] 8.898721 558.4571
## aicSelGAM$gam4[[1]] 9.894353 560.3463
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1939.5      241.7   8.024 1.17e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                       edf Ref.df      F  p-value    
## s(Year):Periodone   1.716      3  10.75 4.15e-06 ***
## s(Year):Periodtwo   1.991      3 146.18  < 2e-16 ***
## s(Year):Periodthree 0.989      1  90.22 7.08e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.949   Deviance explained = 95.7%
## -REML = 277.17  Scale est. = 9.3784e+05  n = 33
aicSelGAM <- aicModelGAM2("nefsc_biomass", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                            df      AIC
## aicSelGAM$gam1[[1]]  2.999609 618.9496
## aicSelGAM$gam2[[1]]  7.966732 588.9560
## aicSelGAM$gam3[[1]] 11.074790 551.9897
summary(aicSelGAM$gam3[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(PC2, by = Period, 
##     k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -276764      53201  -5.202 2.76e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf    Ref.df      F  p-value    
## s(PC1):Periodone   1.002e+00 1.004e+00  0.093 0.765624    
## s(PC1):Periodtwo   2.959e+00 2.995e+00 53.221  < 2e-16 ***
## s(PC1):Periodthree 1.000e+00 1.000e+00 21.092 0.000126 ***
## s(PC2):Periodone   1.000e+00 1.000e+00  0.015 0.902103    
## s(PC2):Periodtwo   1.880e+00 2.079e+00 28.283 1.67e-07 ***
## s(PC2):Periodthree 5.130e-17 5.130e-17  0.000 1.000000    
## s(Year)            9.632e-01 1.000e+00 27.363 8.05e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Rank: 19/20
## R-sq.(adj) =  0.958   Deviance explained = 96.9%
## -REML =  228.2  Scale est. = 7.8263e+05  n = 33
aicSelGAM <- aicModelGAM("nefsc_abundance", 5)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 5.578562 1107.887
## aicSelGAM$gam2[[1]] 6.500238 1109.795
## aicSelGAM$gam3[[1]] 7.056761 1110.649
## aicSelGAM$gam4[[1]] 8.924649 1103.663
summary(aicSelGAM$gam4[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ PC1 + PC2 + PC3 + s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -1197110    2087997  -0.573   0.5715  
## PC1           372067     662998   0.561   0.5796  
## PC2          -585073     945405  -0.619   0.5415  
## PC3         -2190695     806518  -2.716   0.0117 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                           edf Ref.df      F p-value    
## s(Year):Periodone   0.0002572      3  0.000 0.74166    
## s(Year):Periodtwo   1.6285285      3  3.207 0.00793 ** 
## s(Year):Periodthree 1.9446943      3 27.469 6.4e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.883   Deviance explained = 90.7%
## -REML = 494.72  Scale est. = 1.4809e+13  n = 33
aicSelGAM <- aicModelGAM2("nefsc_abundance", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                            df      AIC
## aicSelGAM$gam1[[1]]  2.999094 1146.319
## aicSelGAM$gam2[[1]]  8.872693 1119.656
## aicSelGAM$gam3[[1]] 11.341005 1110.978
summary(aicSelGAM$gam3[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(PC2, by = Period, 
##     k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3099570    1927474   1.608    0.121
## 
## Approximate significance of smooth terms:
##                          edf Ref.df      F  p-value    
## s(PC1):Periodone   1.000e+00  1.000  0.015    0.905    
## s(PC1):Periodtwo   1.000e+00  1.001  0.074    0.787    
## s(PC1):Periodthree 2.797e+00  2.950 29.597 2.52e-09 ***
## s(PC2):Periodone   1.000e+00  1.000  1.341    0.259    
## s(PC2):Periodtwo   1.000e+00  1.000  0.001    0.974    
## s(PC2):Periodthree 2.140e+00  2.390 14.969 5.66e-05 ***
## s(Year)            1.252e-05  1.000  0.000    0.168    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.861   Deviance explained =   90%
## -REML = 445.32  Scale est. = 1.7603e+13  n = 33

NEFSC ASMFC index

Correlation table

cor_fun("NEFSCindex") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
NEFSCindex 511-513 Fem all 0.6605072 0.2298760 0.0734634
NEFSCindex 511-513 Fem fall 0.6450399 0.2567705 0.0264108
NEFSCindex 511-513 Fem spring 0.6487225 0.1808536 0.1352419
NEFSCindex 511-513 M+F all 0.6701265 0.2453494 0.0658980
NEFSCindex 511-513 M+F fall 0.6266569 0.2674609 0.0144243
NEFSCindex 511-513 M+F spring 0.6781325 0.1966688 0.1307461
NEFSCindex 511-513 Mal all 0.6722428 0.2653764 0.0523957
NEFSCindex 511-513 Mal fall 0.5812165 0.2776790 -0.0050943
NEFSCindex 511-513 Mal spring 0.6963721 0.2127369 0.1192236

Scatter plot

xyPlot("NEFSCindex")

AIC lm model selection table

stepwiseAIC("NEFSCindex")
## # A tibble: 9 x 7
## # Groups:   stat_area, name, Sex, Season [9]
##   name       stat_area Sex   Season r.squared  p.value model                    
##   <chr>      <chr>     <chr> <chr>      <dbl>    <dbl> <chr>                    
## 1 NEFSCindex 511-513   Fem   spring     0.866 9.27e-13 lob_index ~ PC1 + PC2 + …
## 2 NEFSCindex 511-513   Fem   fall       0.752 6.38e- 9 lob_index ~ PC1 + PC3 + …
## 3 NEFSCindex 511-513   Mal   spring     0.801 3.12e-11 lob_index ~ PC1 + Year   
## 4 NEFSCindex 511-513   Mal   fall       0.549 6.48e- 6 lob_index ~ PC1 + Year   
## 5 NEFSCindex 511-513   Fem   all        0.816 9.51e-12 lob_index ~ PC1 + Year   
## 6 NEFSCindex 511-513   Mal   all        0.740 1.67e- 9 lob_index ~ PC1 + Year   
## 7 NEFSCindex 511-513   M+F   fall       0.674 5.03e- 8 lob_index ~ PC1 + Year   
## 8 NEFSCindex 511-513   M+F   spring     0.862 1.42e-12 lob_index ~ PC1 + PC2 + …
## 9 NEFSCindex 511-513   M+F   all        0.797 3.96e-11 lob_index ~ PC1 + Year
aicSelGAM <- aicModelGAM("NEFSCindex", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 5.995946 67.70897
## aicSelGAM$gam2[[1]] 6.848717 67.23841
## aicSelGAM$gam3[[1]] 6.991026 68.13149
## aicSelGAM$gam4[[1]] 7.984247 69.42586
summary(aicSelGAM$gam2[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ PC1 + s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.0522     0.1693  12.121 1.48e-12 ***
## PC1           0.1739     0.1091   1.594    0.122    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F  p-value    
## s(Year):Periodone 1.510      3  8.162 2.36e-05 ***
## s(Year):Periodtwo 1.943      3 41.518 3.58e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.927   Deviance explained = 93.7%
## -REML = 37.894  Scale est. = 0.35529   n = 33
aicSelGAM <- aicModelGAM2("NEFSCindex", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]])
##                            df      AIC
## aicSelGAM$gam1[[1]]  2.999551 120.4194
## aicSelGAM$gam2[[1]]  6.879486  79.4312
## aicSelGAM$gam3[[1]] 10.900799  56.8489
summary(aicSelGAM$gam3[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(PC1, by = Period, k = 4) + s(PC2, by = Period, 
##     k = 4) + s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -126.6       29.3  -4.323 0.000236 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F  p-value    
## s(PC1):Periodone 1.0000  1.000  3.685 0.066776 .  
## s(PC1):Periodtwo 1.8065  2.102  1.246 0.293584    
## s(PC2):Periodone 1.5060  1.822  1.639 0.308692    
## s(PC2):Periodtwo 2.9083  2.981 15.163 7.04e-06 ***
## s(Year)          0.9505  1.000 19.222 0.000103 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.952   Deviance explained = 96.4%
## -REML = 32.162  Scale est. = 0.23452   n = 33

ME ASMFC index

Correlation table

cor_fun("MEindex") %>% 
  knitr::kable()
name stat_area Sex Season corPC1 corPC2 corPC3
MEindex 511-513 Fem all 0.7856914 0.2804917 -0.2132999
MEindex 511-513 Fem fall 0.6863246 0.3868473 -0.3913854
MEindex 511-513 Fem spring 0.8172492 0.1221815 0.0233115
MEindex 511-513 M+F all 0.7975967 0.2732375 -0.2059632
MEindex 511-513 M+F fall 0.6914666 0.3834677 -0.3849494
MEindex 511-513 M+F spring 0.8323113 0.1144196 0.0254285
MEindex 511-513 Mal all 0.8079054 0.2655958 -0.1983608
MEindex 511-513 Mal fall 0.6940054 0.3786066 -0.3770079
MEindex 511-513 Mal spring 0.8452919 0.1067958 0.0274104

Scatter plot

xyPlot("MEindex")

AIC lm model selection table

stepwiseAIC("MEindex")
## # A tibble: 9 x 7
## # Groups:   stat_area, name, Sex, Season [9]
##   name    stat_area Sex   Season r.squared     p.value model                 
##   <chr>   <chr>     <chr> <chr>      <dbl>       <dbl> <chr>                 
## 1 MEindex 511-513   Fem   spring     0.874 0.0000110   lob_index ~ PC2 + Year
## 2 MEindex 511-513   Fem   fall       0.846 0.0000335   lob_index ~ PC3 + Year
## 3 MEindex 511-513   Mal   spring     0.906 0.00000226  lob_index ~ PC2 + Year
## 4 MEindex 511-513   Mal   fall       0.816 0.0000901   lob_index ~ PC3 + Year
## 5 MEindex 511-513   Fem   all        0.897 0.000000286 lob_index ~ Year      
## 6 MEindex 511-513   Mal   all        0.900 0.000000232 lob_index ~ Year      
## 7 MEindex 511-513   M+F   fall       0.834 0.0000507   lob_index ~ PC3 + Year
## 8 MEindex 511-513   M+F   spring     0.892 0.00000483  lob_index ~ PC2 + Year
## 9 MEindex 511-513   M+F   all        0.900 0.000000234 lob_index ~ Year
aicSelGAM <- aicModelGAM("MEindex", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]],aicSelGAM$gam3[[1]],aicSelGAM$gam4[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 4.996078 108.8532
## aicSelGAM$gam2[[1]] 5.963635 110.7742
## aicSelGAM$gam3[[1]] 5.843491 116.8753
## aicSelGAM$gam4[[1]] 6.772651 119.0070
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, by = Period, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   58.048      3.217   18.04  5.1e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df     F  p-value    
## s(Year):Periodone   0.9747      3 12.82 3.53e-05 ***
## s(Year):Periodtwo   0.9640      2 13.39 0.000232 ***
## s(Year):Periodthree 0.9557      1 21.57 0.000587 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.887   Deviance explained = 91.2%
## -REML = 54.427  Scale est. = 94.594    n = 14
aicSelGAM <- aicModelGAM2("MEindex", 4)
AIC(aicSelGAM$gam1[[1]], aicSelGAM$gam2[[1]])
##                           df      AIC
## aicSelGAM$gam1[[1]] 2.999915 106.7065
## aicSelGAM$gam2[[1]] 5.984488 110.1648
summary(aicSelGAM$gam1[[1]])
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lob_index ~ s(Year, bs = "re", k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -13023       1264   -10.3 2.57e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F p-value    
## s(Year) 0.9908      1 107.2 2.8e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.892   Deviance explained =   90%
## -REML = 51.415  Scale est. = 90.816    n = 14

Cluster Analysis

Cluster analysis of the lobster biological data.

lobData <- all_lob_data %>%  
  pivot_wider(names_from = name, values_from = lob_index)

allIndex_scaled <- lobData %>%
  filter(stat_area == "511-513",
         Season == "all") %>% 
  select(-Season, -stat_area, -Sex) %>% 
  na.omit() %>% 
  column_to_rownames(., var = "Year") %>% 
  scale() 

allIndex_wss <- allIndex_scaled %>% 
  clusfun()

# look for break in plot like a scree plot

kmeans_scree <- ggplot() + 
  geom_line(aes(x = c(1:12), y = allIndex_wss)) +
  labs(x = "Number of Clusters",
  y ="Within groups sum of squares")

kmeans_scree

# K-Means Cluster Analysis
fit <- allIndex_scaled %>% kmeans(3)

fviz_cluster(object = fit, data = allIndex_scaled) + 
  theme_bw()